Abstract

Martingale convergence theorems, in particular, laws of large numbers, are important
tools in probability and statistics. In this paper, we establish a mean-square
law of large numbers for martingale arrays, with easy-to-verify conditions that
allow smaller-than-usual normalizers. Two applications in nonparametric statistical
estimation are considered, namely, kernel regression for Markov chains and Bayesian
nonparametric density estimation. For the latter application, we give a convergence
rate result for the Bayes density estimator, and investigate the behavior of the posterior
distribution under only support conditions on the prior.

Keywords and phrases: Bayesian asymptotics; density estimation; Hellinger distance;
Markov chain; predictive density; regression.



1 Introduction 

Martingales are one of the most important classes of stochastic processes in modern
probability. For a systematic study of martingales and their properties, see Doob (1953).
Martingales have many applications in both probability and statistics. For discrete-time
martingales, a number of applications are presented in Hall and Heyde (1980); Sections 3
and 4 present two more statistical applications. In continuous time, there is an important
fact that the stochastic integral of a nice integrand with respect to a Brownian motion is
a martingale (Karatzas and Shreve 1991). This, and the work of Dellacherie and Meyer
(1978, 1982), has led to one of the most famous and beautiful applications of martingale
theory to finance, the Black-Scholes option pricing theory (Shiryaev 1999).

In the first part of this paper, we prove a law of large numbers for discrete-time
martingale arrays. The proof is simple, the conditions are easy to verify, and it allows $o(n)$
normalizers, where $n$ is the sample size. Our result is a nice complement to the theorems
in Atchadé (2009), Atchadé and Fort (2010), and Teicher (1998, e.g., Corollary 2). Indeed,
our result gives $L_2$ convergence while the others focus on almost sure convergence,
and Teicher's results are for ordinary martingale sequences, not arrays.

In the second part of the paper, we consider two applications of our martingale
convergence result. The first application, in Section 3, is in kernel regression estimation
for dependent-data sequences. In particular, when covariate and error terms are
modeled as a bivariate Markov chain, we use our martingale law of large numbers to prove
that, under suitable conditions on the kernel and underlying Markov chain, the famous
Nadaraya-Watson kernel estimator is consistent in an $L_2$ sense.

The second application, in Section 4, is Bayesian nonparametric density estimation.
Martingale methods have been used to investigate convergence of Bayesian posterior
distributions, dating back to Doob (1949). Martingale laws of large numbers, in particular,
have received some attention in the posterior consistency literature; see Walker (2003,
2004a,b) and Ghosal and Tang (2006). Despite their success in the posterior consistency
problem, martingale methods have been relatively unexplored in the more challenging
posterior convergence rates problem. A key observation is that the usual normalizer $n$,
the sample size, is too large for studying rates of convergence for Bayesian quantities.
Our general martingale law of large numbers carries $o(n)$ normalizers, so we are equipped
to tackle such problems. In particular, we give a new proof of a result of Barron (1999)
which says that, under a suitable local support condition on the prior, a Cesàro average
of the predictive densities converges at a certain rate. That is, only local conditions on
the prior are required to ensure that the predictive densities are suitably well-behaved
asymptotically. We also investigate properties of the posterior distribution itself based on
our martingale method, though the approach seems to fall short of establishing a proper
posterior convergence rate theorem.



2 A martingale law of large numbers 



Fix a probability space $(\Omega, \mathcal{A}, P)$, where $\mathcal{A}$ is a $\sigma$-algebra on $\Omega$ and $P$ is a probability
measure. For each $n \geq 1$, let $(X_{n,i}, \mathcal{A}_i)_{1 \leq i \leq n}$ be a square-integrable martingale difference
array defined on $\Omega$. That is, $(\mathcal{A}_i)$ is a non-decreasing sequence of sub-$\sigma$-algebras contained in
$\mathcal{A}$, with $\mathcal{A}_0 = \{\emptyset, \Omega\}$, and, for each $n \geq 1$, $X_{n,i}$ is an $\mathcal{A}_i$-measurable random variable with
$E(X_{n,i}^2) < \infty$ and $E(X_{n,i} \mid \mathcal{A}_{i-1}) = 0$. Then for each $n$, $(M_{n,k}, \mathcal{A}_k)$, with $M_{n,k} = \sum_{i=1}^k X_{n,i}$,
$k = 1, \ldots, n$, is a martingale sequence. The goal is to find conditions on $(X_{n,i})$ such that
$M_{n,n}/\omega_n \to 0$ in some sense, for a suitable sequence of numbers $(\omega_n)$.

Laws of large numbers for martingale arrays are apparently not so common. The case
$X_{n,i} \equiv X_i$ is more common, and a well-known result on the stability of the partial sum
$M_n = \sum_{i=1}^n X_i$ is presented in, e.g., Loève (1963, p. 387). This result says that $M_n/n \to 0$
almost surely if $\sum_n n^{-2} E(X_n^2) < \infty$. This result was used by Walker (2003, 2004a) in his
exploration of Bayesian posterior consistency. It turns out, however, that normalizing by
$n$ is not appropriate for the study of posterior convergence rates. Teicher (1998) gives
laws of large numbers for basic martingale sequences (not arrays) with $o(n)$ normalizers;
however, his conditions do not seem appropriate for the present context.

The following general result gives sufficient conditions for the normalized partial sum
sequence $M_{n,n}/\omega_n$ to converge to zero in a couple of different modes. Observe that $\omega_n = o(n)$
is possible under these conditions.

Proposition 1. Let $(M_{n,k}, \mathcal{A}_k)$ be a martingale array as described above, and $(\omega_n)$, $(a_n)$
positive, increasing sequences of numbers such that $\omega_n \wedge a_n \to \infty$. If

$$\frac{a_n}{\omega_n^2} \sum_{i=1}^n E(X_{n,i}^2) = O(1), \quad n \to \infty, \tag{1}$$

then $M_{n,n}/\omega_n \to 0$ in $L_2$. Also, $\max_{1 \leq k \leq n} |M_{n,k}|/\omega_n \to 0$ in probability.

Proof. Since, for each $n$, $(X_{n,i})$ is a martingale difference sequence, a simple calculation
with iterated expectations reveals that

$$E|M_{n,n}/\omega_n - 0|^2 = E(M_{n,n}^2)/\omega_n^2 = \frac{1}{\omega_n^2} \sum_{i=1}^n E(X_{n,i}^2).$$

According to (1), the last term is $O(a_n^{-1})$, which goes to zero since $a_n \to \infty$. This proves
$L_2$ convergence. For the second claim, Theorem VII.3.3 in Shiryaev (1996) gives

$$P\Bigl(\max_{1 \leq k \leq n} |M_{n,k}| > \lambda \omega_n\Bigr) \leq \frac{E(M_{n,n}^2)}{\lambda^2 \omega_n^2}, \quad \forall\, \lambda > 0.$$

Then the established $L_2$ convergence shows that the upper bound vanishes as $n \to \infty$,
proving the second claim. □
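
As a rough numerical illustration of Proposition 1, the following Python sketch estimates the mean square of a normalized martingale sum by Monte Carlo. Everything specific here is an assumption made for illustration and is not taken from the paper: the increments are random signs multiplied by a scale that depends only on the past (so they are martingale differences bounded by 1), the normalizer is $n^{3/4}$, and the companion sequence is $a_n = n^{1/2}$. Under these choices, the sufficient condition of Proposition 1 holds with the bound $n / n^{3/2} = n^{-1/2}$ on the mean square.

```python
import random


def mean_square_normalized_sum(n, reps, rng):
    """Monte Carlo estimate of E{(M_{n,n} / w_n)^2} with w_n = n^(3/4)."""
    w_n = n ** 0.75  # an o(n) normalizer (illustrative choice)
    total = 0.0
    for _ in range(reps):
        m = 0.0
        for _ in range(n):
            # The scale is chosen from the past only, so each increment has
            # conditional mean zero: a martingale difference bounded by 1.
            scale = 1.0 if m >= 0 else 0.5
            m += scale * rng.choice((-1.0, 1.0))
        total += (m / w_n) ** 2
    return total / reps


rng = random.Random(0)  # fixed seed so the run is reproducible
estimates = {n: mean_square_normalized_sum(n, 1000, rng) for n in (25, 100, 400)}
for n, est in estimates.items():
    # With a_n = n^(1/2), the mean square is at most n / w_n^2 = n^(-1/2).
    print(n, round(est, 4), "bound:", round(n ** -0.5, 4))
```

The estimates should shrink as $n$ grows and stay below the $n^{-1/2}$ bound up to Monte Carlo error, even though the normalizer is of smaller order than $n$.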



3 Kernel regression for Markov chains 
3.1 Setup and notation 

Consider a (joint) Markov chain $\{(X_i, Z_i) : i \geq 0\}$ with state space $\mathbb{R}^2$ satisfying

$$P\{(X_n, Z_n) \in A \times B \mid (X_{n-1}, Z_{n-1})\} = \int_A p(X_{n-1}, x)\, q(x, B)\, dx,$$

where $p$ is the transition density for $(X_i)$ and $q(x, B) = P(Z_n \in B \mid X_n = x)$ is the conditional
distribution of $Z_n$, given $X_n = x$. Assume the transition density $p$ admits a stationary distribution
with density $\pi$, i.e., $\pi(x) = \int \pi(y) p(y, x)\, dy$ for each $x$. In addition, we assume that $(X_i)$
is geometrically ergodic, i.e., if $P$ is the Markov kernel corresponding to $p$, then

• $P$ is $\psi$-irreducible and aperiodic, and

• there exist a function $V : \mathbb{R} \to [1, \infty)$, constants $\lambda \in (0, 1)$ and $b \in (0, \infty)$, and a
small set $C \subset \mathbb{R}$ such that the following drift condition is satisfied:

$$PV(x) \leq \lambda V(x) + b\, 1_C(x), \tag{2}$$

where $PV(x) = \int V(y) p(x, y)\, dy$.

Precise definitions of these quantities/concepts can be found in Meyn and Tweedie (1993)
and Robert and Casella (2004, Ch. 6). For example, the drift condition can be found in
equation (6.42) of Robert and Casella's book.
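
For reference, here is a short sketch of what the drift condition (2) gives when it is iterated along the chain. It uses the added assumption, not stated above, that $E\{V(X_0)\} < \infty$; here $\lambda$ denotes the contraction constant in $(0,1)$ and $b$ the additive constant in (2).

```latex
\begin{align*}
E\{V(X_i)\} &= E\{PV(X_{i-1})\}
  \le \lambda\, E\{V(X_{i-1})\} + b\, P(X_{i-1} \in C)
  \le \lambda\, E\{V(X_{i-1})\} + b, \\
E\{V(X_i)\} &\le \lambda^{i}\, E\{V(X_0)\} + b \sum_{k=0}^{i-1} \lambda^{k}
  \le E\{V(X_0)\} + \frac{b}{1-\lambda}, \qquad i \ge 1.
\end{align*}
```

The resulting bound is uniform in $i$ and depends only on $\lambda$, $b$, and the distribution of $X_0$.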

In addition to the joint $(X, Z)$ chain, we consider a dependent variable $Y$ defined as

$$Y_i = \eta(X_i) + Z_i, \quad i \geq 0.$$

The goal is to estimate the regression function $\eta$. For this, we shall consider a kernel
estimator, namely the Nadaraya-Watson estimator,

$$\hat\eta_n(x_0) = \frac{\sum_{i=1}^n Y_i\, K((x_0 - X_i)/h_n)}{\sum_{i=1}^n K((x_0 - X_i)/h_n)},$$

where $K$ is a kernel density function and $h_n$ a bandwidth parameter. Following Atchadé
(2009), we shall study convergence properties of

$$\hat\eta_{n,\psi}(x_0) = \frac{1}{n h_n} \sum_{i=1}^n \psi(Y_i)\, K((x_0 - X_i)/h_n) \tag{3}$$

for a generic function $\psi$ and fixed $x_0$. Then properties of $\hat\eta_n(x_0)$ can be deduced directly
by considering $\psi(y) = y$ for the numerator and $\psi(y) = 1$ for the denominator.
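
The numerator/denominator construction can be sketched in Python as follows. This is only an illustration under assumptions that are not in the paper: the covariate chain is a Gaussian AR(1) process, the errors are independent Gaussian noise (a special case where the conditional error distribution does not depend on the covariate), the regression function is the sine function, the kernel is Gaussian, and the bandwidth is $n^{-1/3}$. The function names are hypothetical.

```python
import math
import random


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def eta_psi(xs, ys, x0, h, psi):
    """The statistic in (3): (n h)^{-1} sum_i psi(Y_i) K((x0 - X_i) / h)."""
    n = len(xs)
    total = sum(psi(y) * gaussian_kernel((x0 - x) / h) for x, y in zip(xs, ys))
    return total / (n * h)


def nadaraya_watson(xs, ys, x0, h):
    numerator = eta_psi(xs, ys, x0, h, lambda y: y)      # psi(y) = y
    denominator = eta_psi(xs, ys, x0, h, lambda y: 1.0)  # psi(y) = 1
    return numerator / denominator


def simulate_chain(n, rng):
    """Hypothetical example: X is a Gaussian AR(1) chain, Z is independent noise."""
    xs, ys = [], []
    x = rng.gauss(0.0, 1.0)
    for _ in range(n):
        x = 0.5 * x + rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 0.5)
        xs.append(x)
        ys.append(math.sin(x) + z)  # regression function sin(x), assumed
    return xs, ys


rng = random.Random(1)  # fixed seed so the run is reproducible
n = 5000
xs, ys = simulate_chain(n, rng)
h_n = n ** (-1.0 / 3.0)  # bandwidth of order n^(-gamma) with gamma = 1/3
estimate = nadaraya_watson(xs, ys, 0.5, h_n)
print(estimate, math.sin(0.5))
```

With $\psi(y) = 1$ the statistic in (3) is a kernel density estimate of the stationary density at $x_0$, which is why the ratio of the two versions recovers the regression estimate.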

3.2 Consistency result 

Before we state and prove the consistency theorem for the kernel regression estimator
$\hat\eta_{n,\psi}(x_0)$ in (3), we first list our assumptions. For the function $\psi$ we assume

$$\sup_x \frac{1}{V(x)^{1/2}} E\{|\psi(Y)| \mid X = x\} < \infty, \qquad \sup_x \frac{1}{V(x)} E\{\psi(Y)^2 \mid X = x\} < \infty. \tag{4}$$

This means that growth of the conditional expectation, as a function of $x$, is somehow
balanced by the growth of the drift function $V(x)$. For the kernel $K$, in addition to being
a density, we assume

$$\sup_x K(x) < \infty, \qquad \lim_{x \to \pm\infty} |x| K(x) = 0. \tag{5}$$

The second condition controls the tails of $K$; it holds, for instance, for the Gaussian
kernel and for any bounded kernel with compact support.
Finally, for the bandwidth $h_n$ we assume

$$h_n \text{ is of the order } n^{-\gamma}, \text{ for some } \gamma \in (0, 1/2). \tag{6}$$

The following result is a variation on Theorem 2.1 in Atchadé (2009). He uses a
different martingale array law of large numbers to prove almost sure convergence of
$\hat\eta_{n,\psi}(x_0)$; another application of his law of large numbers appears in Atchadé and Fort
(2010). With our law of large numbers (Proposition 1) we get an $L_2$ convergence result.
Our conditions are also weaker than Atchadé's in the sense that we do not need the kernel
$K$ to satisfy a Lipschitz condition, and our proof is arguably simpler.

Proposition 2. With the setup and notation in Section 3.1, assume (4), (5), and (6).
If $x \mapsto \pi(x) E\{\psi(Y) \mid X = x\}$ is continuous at $x_0$, then

$$\hat\eta_{n,\psi}(x_0) \to \pi(x_0) E\{\psi(Y) \mid X = x_0\}, \quad \text{in } L_2.$$

Proof. Following the setup in Atchadé (2009), for $h > 0$ define

$$F_h(x, y) = K((x_0 - x)/h)\, \psi(y), \qquad f_h(x) = K((x_0 - x)/h)\, E\{\psi(Y) \mid X = x\},$$

and

$$g_h(x) = \sum_{\ell \geq 0} \{P^\ell f_h(x) - \pi f_h\},$$

where $P^\ell = P(P^{\ell-1})$ are iterates of $P$ and $\pi f_h = \int f_h(x) \pi(x)\, dx$. By (4), boundedness of
$K$, and geometric ergodicity of $P$, the function $g_h$ is well-defined and $|g_h(x)| V(x)^{-1/2}$ is
bounded. One can also show that $g_h$ satisfies the Poisson equation for $f_h$ and $P$, i.e.,

$$g_h(x) - P g_h(x) = f_h(x) - \pi f_h.$$

If $H_h(x, y) = F_h(x, y) + P g_h(x)$, then it is relatively easy to show, using basic properties
of the Markov kernel $P$, that

$$E\{H_h(X_n, Y_n) \mid X_{n-1} = x, Y_{n-1} = y\} = P g_h(x) + \pi f_h,$$

and, therefore,

$$F_h(x, y) - \pi f_h = H_h(x, y) - E\{H_h(X_n, Y_n) \mid X_{n-1} = x, Y_{n-1} = y\}.$$
From here we may decompose $\hat\eta_{n,\psi}(x_0) := (n h_n)^{-1} \sum_{i=1}^n F_{h_n}(X_i, Y_i)$ as

$$\hat\eta_{n,\psi}(x_0) = \frac{1}{h_n} \pi f_{h_n} + \frac{1}{n h_n} \sum_{i=1}^n D_{n,i},$$

where $D_{n,i} = H_{h_n}(X_i, Y_i) - E\{H_{h_n}(X_i, Y_i) \mid \mathcal{F}_{i-1}\}$ and $\mathcal{F}_k$ is the $\sigma$-algebra generated by
$\{(X_i, Y_i) : i = 0, \ldots, k\}$, $k \geq 0$. By (5) and (6), we have

$$\frac{1}{h_n} \pi f_{h_n} = \frac{1}{h_n} \int K((x_0 - x)/h_n)\, E\{\psi(Y) \mid X = x\}\, \pi(x)\, dx \to E\{\psi(Y) \mid X = x_0\}\, \pi(x_0);$$

see, e.g., DasGupta (2008, Theorem 32.1). Therefore, it remains to show that
$(n h_n)^{-1} \sum_{i=1}^n D_{n,i} \to 0$ in $L_2$. Towards this, observe that $(D_{n,i})$ is a martingale difference
array, so application of Proposition 1 is possible. Atchadé (2009) argues that, by
(4), boundedness of $K$, and the drift condition, $(D_{n,i})$ satisfies

$$E(D_{n,i}^2) \leq E\{H_{h_n}(X_i, Y_i)^2\} \leq c\, E\{V(X_i)\},$$

for $c$ a constant independent of $n$ and $i$. A simple but important consequence of the drift
condition (2) is that the expectations on the right-hand side of the previous display are
uniformly bounded in $i$ by a constant that depends on $\lambda$, $b$, and the distribution of $X_0$.
This implies that $E(D_{n,i}^2)$ are uniformly bounded in $(n, i)$ by some constant $C$. If we take
$\omega_n = n h_n$ and $a_n = n h_n^2$, then $a_n \to \infty$ by (6) and

$$\frac{a_n}{\omega_n^2} \sum_{i=1}^n E(D_{n,i}^2) \leq \frac{C n a_n}{\omega_n^2} = C.$$

Then it follows from Proposition 1 that $(n h_n)^{-1} \sum_{i=1}^n D_{n,i} \to 0$ in $L_2$. □

4 Bayesian density estimation



4.1 Notation and definitions 

Let $(\mathbb{Y}, \mathcal{Y})$ be a measurable space, and let $Y_1, \ldots, Y_n$ be independent $\mathbb{Y}$-valued random
variables having density $f$ with respect to a $\sigma$-finite measure $\mu$ on $\mathcal{Y}$. Let $\mathbb{F}$ be a subset
of all $\mu$-densities $f$, and $\Pi$ a prior distribution on $\mathbb{F}$. From Bayes' theorem, the posterior
probability of $A \subset \mathbb{F}$, given $Y_1, \ldots, Y_n$, is given by

$$\Pi_n(A) = \frac{\int_A \prod_{i=1}^n f(Y_i)\, \Pi(df)}{\int_{\mathbb{F}} \prod_{i=1}^n f(Y_i)\, \Pi(df)}. \tag{7}$$

Take the Hellinger distance $H$ on $\mathbb{F}$, $H(f_1, f_2) = \{\int (f_1^{1/2} - f_2^{1/2})^2\, d\mu\}^{1/2}$.

Now take a non-Bayesian point of view and assume that there is a "true density" $f^*$
from which the data $Y_1, \ldots, Y_n$ are observed. It shall be required that the prior $\Pi$ puts
a sufficient amount of mass around this $f^*$; see Section 4.2. With "true density" $f^*$,
it is typical to rewrite the posterior (7) as

$$\Pi_n(A) = \frac{\int_A R_n(f)\, \Pi(df)}{\int_{\mathbb{F}} R_n(f)\, \Pi(df)}, \tag{8}$$

where $R_0(f) = 1$ and $R_n(f) = \prod_{i=1}^n f(Y_i)/f^*(Y_i)$, $n \geq 1$. In what follows, we will
occasionally refer to the posterior $\Pi_n$, restricted to a given set $A$. By that we mean the
measure $\Pi_n^A$ defined as $\Pi_n^A(\cdot) = \Pi_n(A \cap \cdot)/\Pi_n(A)$. Also, $\lesssim$ and $\gtrsim$ will denote inequality
up to a universal constant.

A convergence rate of the posterior distribution concerns the amount of probability
assigned to shrinking neighborhoods of the true density $f^*$ as $n \to \infty$. Let $(\varepsilon_n)$ be a
positive vanishing sequence. Then the posterior distribution $\Pi_n$ has a Hellinger convergence
rate $\varepsilon_n$ if $\Pi_n(\{f : H(f^*, f) > \varepsilon_n\}) \to 0$ in probability.
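
To make these definitions concrete, here is a small Python sketch for a toy model. All specifics are assumptions for illustration only and do not come from the paper: the model is a finite set of Bernoulli densities, the prior is uniform on that set, the "true density" is one of its members, and the data are a fixed sequence. The posterior is computed through likelihood ratios relative to the true density, as in the rewritten form of the posterior above.

```python
import math


def bernoulli_density(theta):
    """Density of a Bernoulli(theta) law with respect to counting measure on {0, 1}."""
    return {0: 1.0 - theta, 1: theta}


def hellinger(f1, f2):
    """H(f1, f2) = {sum_y (f1(y)^(1/2) - f2(y)^(1/2))^2}^(1/2)."""
    return math.sqrt(sum((math.sqrt(f1[y]) - math.sqrt(f2[y])) ** 2 for y in f1))


def posterior_weights(prior, densities, f_star, data):
    """Posterior over a finite model via likelihood ratios against f_star."""
    ratios = [math.prod(f[y] / f_star[y] for y in data) for f in densities]
    numer = [p * r for p, r in zip(prior, ratios)]
    total = sum(numer)  # the normalizing denominator
    return [v / total for v in numer]


thetas = [0.1, 0.3, 0.5, 0.7, 0.9]           # hypothetical finite model
densities = [bernoulli_density(t) for t in thetas]
prior = [1.0 / len(thetas)] * len(thetas)    # uniform prior, assumed
f_star = bernoulli_density(0.3)              # "true density", assumed
data = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0] * 10   # fixed sample with 30% ones

post = posterior_weights(prior, densities, f_star, data)
eps = 0.1
# Posterior mass of the set of densities more than eps away from f_star.
mass_far = sum(w for w, f in zip(post, densities) if hellinger(f_star, f) > eps)
print(post, mass_far)
```

In this toy setting the posterior mass outside a small Hellinger ball around the true density is already tiny after 100 observations, which is the kind of behavior a convergence rate statement quantifies.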

4.2 Prior support conditions 

In order for the posterior distribution to concentrate around $f^*$, some support conditions
on the prior $\Pi$ are needed. For example, if there exists a set $A \ni f^*$ such that $\Pi(A) = 0$,
then, trivially, the posterior cannot concentrate around $f^*$. To avoid these kinds of
degeneracies, it is typical to assume that $\Pi$ puts a sufficient amount of mass near $f^*$. For
this, let $K(f^*, f) = \int \log(f^*/f)\, f^*\, d\mu$ be the Kullback-Leibler divergence of $f$ from $f^*$,
and $V(f^*, f) = \int \{\log(f^*/f)\}^2 f^*\, d\mu$ the corresponding second moment.

Definition 1. Let $(\varepsilon_n)$ be a positive sequence such that $\varepsilon_n \to 0$ and $n\varepsilon_n^2 \to \infty$. Then $f^*$
is in the $\varepsilon_n$-support of the prior $\Pi$ if, for some constant $C > 0$,

$$\Pi(\{f : K(f^*, f) \leq \varepsilon_n^2,\ V(f^*, f) \leq \varepsilon_n^2\}) \geq e^{-C n \varepsilon_n^2}. \tag{9}$$

Intuitively, (9) means that the prior mass near $f^*$ is, in some sense, not too small.
Note that (9) is stronger than the standard Kullback-Leibler property used by Schwartz
(1965). Beyond this intuition, (9) yields the following technical lemma, which gives a lower
bound on the denominator in (8); see Ghosal et al. (2000, Lemma 8.1).

Lemma 1. Let $I_n = \int R_n(f)\, \Pi(df)$ be the denominator in (8). If $f^*$ is in the $\varepsilon_n$-support
of the prior $\Pi$, then $P(I_n \leq e^{-c n \varepsilon_n^2}) \to 0$ for any $c > C + 1$ with $C$ as in (9).

4.3 Convergence rates for predictive densities



Our development here on convergence rates of predictive densities is reminiscent of
Theorem 2 of Walker (2003) and the result preceding Theorem 3 of Walker (2004a). But the
use of Proposition 1, as opposed to classical martingale laws of large numbers, allows us
to make conclusions about rates of convergence.

The first step is to construct an appropriate martingale sequence. As before, let $I_n$
be the denominator in (8). It is easy to see that

$$I_i / I_{i-1} = \hat f_{i-1}(Y_i) / f^*(Y_i), \quad i \geq 1,$$

where $\hat f_k(y) = \int f(y)\, \Pi_k(df)$ is the posterior predictive density based on $Y_1, \ldots, Y_k$. Set
$T(x) = x^{1/2} - 1$, and write $\mathcal{Y}_n$ for the $\sigma$-algebra generated by $Y_1, \ldots, Y_n$. It follows that

$$E\{T(I_i/I_{i-1}) \mid \mathcal{Y}_{i-1}\} = -\int \{1 - (\hat f_{i-1}/f^*)^{1/2}\}\, f^*\, d\mu = -h(f^*, \hat f_{i-1}),$$

where $h = H^2/2 \leq 1$ is a slight modification of the Hellinger distance. Then, clearly,
the sequence $(M_n, \mathcal{Y}_n)$, with $M_n = \sum_{i=1}^n X_i$ and $X_i = T(I_i/I_{i-1}) + h(f^*, \hat f_{i-1})$, forms a
martingale. In this case, the martingale difference array is a more familiar martingale
difference sequence, but the result in Proposition 1 still applies. Here is a version of a
result of Barron (1999).



Proposition 3. For any $a_n \to \infty$, let $\varepsilon_n = (a_n/n)^{1/4}$. If $f^*$ is in the $\varepsilon_n$-support of $\Pi$, then
$(n\varepsilon_n^2)^{-1} \sum_{i=1}^n h(f^*, \hat f_{i-1})$ is bounded in probability. Consequently, if $\bar f_n = n^{-1} \sum_{i=1}^n \hat f_{i-1}$ is
the average of predictive densities, then $\varepsilon_n^{-2} h(f^*, \bar f_n)$ is bounded in probability.

Proof. The key to the proof is the fact that, for the martingale difference $(X_i)$,

$$E(X_i^2 \mid \mathcal{Y}_{i-1}) \leq \int \{(\hat f_{i-1}/f^*)^{1/2} - 1\}^2 f^*\, d\mu = 2 h(f^*, \hat f_{i-1}).$$

Since $h \leq 1$, we have $E(X_i^2 \mid \mathcal{Y}_{i-1}) \leq 2$. Therefore, if $\omega_n = n\varepsilon_n^2 = (a_n n)^{1/2}$, then

$$\frac{a_n}{\omega_n^2} \sum_{i=1}^n E(X_i^2) \leq \frac{2 n a_n}{\omega_n^2} = 2 = O(1), \quad n \to \infty.$$

Hence, Proposition 1 applies with $\omega_n = n\varepsilon_n^2$, so

$$\frac{1}{n\varepsilon_n^2} \sum_{i=1}^n T(I_i/I_{i-1}) + \frac{1}{n\varepsilon_n^2} \sum_{i=1}^n h(f^*, \hat f_{i-1}) \to 0, \quad \text{in } L_2. \tag{10}$$

Following Walker (2004a), since arithmetic means are no smaller than geometric means,

$$\frac{1}{n\varepsilon_n^2} \sum_{i=1}^n T(I_i/I_{i-1}) = \frac{1}{\varepsilon_n^2} \Bigl\{ \frac{1}{n} \sum_{i=1}^n (I_i/I_{i-1})^{1/2} - 1 \Bigr\} \geq \frac{1}{\varepsilon_n^2} \bigl( I_n^{1/(2n)} - 1 \bigr) \geq \frac{1}{2 n \varepsilon_n^2} \log I_n.$$

By Lemma 1, the right-hand side is bounded below by $-c/2$, for any $c > C + 1$, with probability
tending to 1. Since the first term in (10) is lower bounded by a negative quantity, the second
term, $(n\varepsilon_n^2)^{-1} \sum_{i=1}^n h(f^*, \hat f_{i-1})$, must be upper bounded, in probability, proving the first
claim. The second claim follows from the first and convexity of $h$. □

In words, Proposition 3 states that if $\Pi$ is suitably concentrated around $f^*$, then
$\bar f_n \to f^*$, in the Hellinger metric, at a roughly $n^{-1/4}$ rate. The prior would have to
be rather strange for this not to imply convergence of the predictive density $\hat f_n$ itself
at the same rate (Walker 2003). To understand why the rate is $n^{-1/4}$, note that no
assumptions about the smoothness of $f^*$ have been made. Suppose $\mathbb{Y}$ is $\mathbb{R}^d$ for some
$d \geq 1$. Good convergence properties can be derived for nonparametric estimators when
$f^*$ (or $\log f^*$) is assumed to be $\alpha$-Hölder, with $\alpha > d/2$; see, e.g., Corollary 2.7.2 in
van der Vaart and Wellner (1996) and the surrounding discussion. Indeed, for such $f^*$,
the minimax rate of convergence is $n^{-\alpha/(2\alpha + d)}$. At the boundary, with $\alpha = d/2$, it is
clear that the minimax rate is $n^{-1/4}$, so, in some sense, our result describes a sort of
worst-case scenario. In fact, for Brownian motion priors, whose paths are continuous but
not smooth, van der Vaart and van Zanten (2008) show that the rate of convergence is
exactly $n^{-1/4}$.
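
The martingale construction in this subsection rests on the identity between successive ratios of the posterior normalizing constant and the predictive-to-true density ratio. The Python sketch below checks that identity numerically and tracks the running (Cesàro) average of $h = H^2/2$ between the true density and the predictive densities. The setting is an assumption for illustration only: a finite Bernoulli model with a uniform prior, a true density inside the model, and a fixed data sequence.

```python
import math


def bernoulli_density(theta):
    return {0: 1.0 - theta, 1: theta}


def half_squared_hellinger(f1, f2):
    """h = H^2 / 2 = 1 - sum_y {f1(y) f2(y)}^(1/2)."""
    return 1.0 - sum(math.sqrt(f1[y] * f2[y]) for y in f1)


thetas = [0.1, 0.3, 0.5, 0.7, 0.9]           # hypothetical finite model
densities = [bernoulli_density(t) for t in thetas]
prior = [1.0 / len(thetas)] * len(thetas)    # uniform prior, assumed
f_star = bernoulli_density(0.3)              # "true density", assumed
data = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0] * 10   # fixed sample with 30% ones

ratios = [1.0] * len(thetas)  # likelihood ratios, equal to 1 before any data
I_prev = 1.0                  # normalizing constant before any data
h_values, max_gap = [], 0.0
for y in data:
    # Posterior weights and predictive density from the data seen so far.
    post = [p * r / I_prev for p, r in zip(prior, ratios)]
    predictive = {v: sum(w * f[v] for w, f in zip(post, densities)) for v in (0, 1)}
    h_values.append(half_squared_hellinger(f_star, predictive))
    # Update the likelihood ratios and recompute the normalizing constant directly.
    ratios = [r * f[y] / f_star[y] for r, f in zip(ratios, densities)]
    I_curr = sum(p * r for p, r in zip(prior, ratios))
    # Identity: I_i / I_{i-1} equals predictive(Y_i) / f_star(Y_i).
    max_gap = max(max_gap, abs(I_curr / I_prev - predictive[y] / f_star[y]))
    I_prev = I_curr

cesaro = [sum(h_values[:k]) / k for k in range(1, len(h_values) + 1)]
print(max_gap, cesaro[9], cesaro[-1])
```

The gap in the identity should be at the level of floating-point rounding, and the running average of $h$ should decrease as more observations arrive.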

4.4 Posterior behavior away from $f^*$

The previous subsection looked at $\mathbb{F}$ as a whole. Here, the goal is to investigate the
behavior of the posterior probabilities $\Pi_n(A_n)$ when the sets $A_n$ are not too close to $f^*$.
To start, we first construct the appropriate martingale.

Given a sequence $(A_n)$ of measurable subsets of $\mathbb{F}$, let $\hat f_k^{A_n}$ denote the predictive
distribution of $Y_{k+1}$, given $Y_1, \ldots, Y_k$, $k = 1, \ldots, n$, when the posterior is restricted to $A_n$. Let
$L_{n,i} = \int_{A_n} R_i(f)\, \Pi(df)$ be the numerator of $\Pi_i(A_n)$ in (8), $i \leq n$. Then it is clear that

$$L_{n,i} / L_{n,i-1} = \hat f_{i-1}^{A_n}(Y_i) / f^*(Y_i), \quad i = 1, \ldots, n.$$

For $T(x) = \sqrt{x} - 1$ as in Section 4.3, $E\{T(L_{n,i}/L_{n,i-1}) \mid \mathcal{Y}_{i-1}\} = -h(f^*, \hat f_{i-1}^{A_n})$. Therefore,
$X_{n,i} = T(L_{n,i}/L_{n,i-1}) + h(f^*, \hat f_{i-1}^{A_n})$ is a martingale difference array and the limiting
behavior of $M_{n,n} = \sum_{i=1}^n X_{n,i}$ can be studied using Proposition 1.

Towards finding posterior convergence rates, the following preliminary result will be
useful. It resembles Theorem 1 of Walker (2004a), but with information about rates.



Proposition 4. Given $a_n \to \infty$, set $\varepsilon_n = (a_n/n)^{1/4}$. If $f^*$ is in the $\varepsilon_n$-support of $\Pi$, and

$$\liminf_{n \to \infty} \frac{1}{n\varepsilon_n^2} \sum_{i=1}^n h(f^*, \hat f_{i-1}^{A_n}) > \frac{C + 1}{2}, \quad \text{in probability,} \tag{11}$$

where $C$ is as in (9), then $\Pi_n(A_n) \to 0$ in probability. Specifically, $\Pi_n(A_n) \leq \Pi(A_n) e^{-\nu n \varepsilon_n^2}$,
for some $\nu > 0$, with probability tending to 1.

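Proof. Fix $\omega_n = n\varepsilon_n^2$. The same argument as in the proof of Proposition 3 shows that

$$\frac{1}{\omega_n} \sum_{i=1}^n T(L_{n,i}/L_{n,i-1}) + \frac{1}{\omega_n} \sum_{i=1}^n h(f^*, \hat f_{i-1}^{A_n}) \to 0, \quad \text{in probability.} \tag{12}$$

From (11) we can conclude that

$$\limsup_{n \to \infty} \frac{1}{\omega_n} \sum_{i=1}^n T(L_{n,i}/L_{n,i-1}) < -\frac{C + 1}{2}, \quad \text{in probability.}$$

Another arithmetic-to-geometric means comparison gives

$$\limsup_{n \to \infty} \frac{1}{2\omega_n} \log \frac{L_{n,n}}{L_{n,0}} \leq \limsup_{n \to \infty} \frac{n}{\omega_n} \Bigl\{ \Bigl( \frac{L_{n,n}}{L_{n,0}} \Bigr)^{1/(2n)} - 1 \Bigr\} < -\frac{C + 1}{2}.$$

Since $L_{n,0} = \Pi(A_n)$, it follows that $L_{n,n} \leq \Pi(A_n) e^{-d n \varepsilon_n^2}$ in probability, for some $d > C + 1$.
Similarly, $I_n \geq e^{-c n \varepsilon_n^2}$ in probability for any $c \in (C + 1, d)$. Therefore, $\Pi_n(A_n) = L_{n,n}/I_n \leq
\Pi(A_n) e^{-(d - c) n \varepsilon_n^2} \to 0$ in probability since $d > c$ and $n\varepsilon_n^2 \to \infty$. □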

Proposition 4 captures the essence of how the posterior behaves away from $f^*$: if the
prior is sufficiently concentrated around $f^*$, then sets $A_n$ which, somehow, do not get
too close to $f^*$ have vanishing posterior probability. And this result holds without any
explicit global conditions on the prior, only local support conditions.

It is straightforward to extend Proposition 4 to a finite collection of sequences, say,
$(A_{n,j})$, where $n \geq 1$ and $j = 1, \ldots, J$ for fixed finite $J$. In that case,

$$\Pi_n(A_{n,1} \cup \cdots \cup A_{n,J}) \leq \sum_{j=1}^J \Pi_n(A_{n,j}) \to 0 \quad \text{in probability.}$$

Since $J$ is fixed and finite, we cannot reach any formal posterior convergence rate results
on this path. However, it does give us some stronger intuition about the behavior of $\Pi_n$.
For example, take $A_{n,j}$ to be a Hellinger ball with radius increasing with $n$ and center $f_{n,j}$
moving away from $f^*$ in such a way that (11) holds for each $j = 1, \ldots, J$. If we take $J$
to be very large, then, in some sense, the union $A_{n,1} \cup \cdots \cup A_{n,J}$ of these expanding balls
almost fills up the space outside the collapsing neighborhood of $f^*$. Then the previous
display gives a sort of posterior convergence rate result.

Next, take $A_n \equiv A$ fixed. Then $\Pi_n(A) \to 0$ if $(n\varepsilon_n^2)^{-1} \sum_{i=1}^n h(f^*, \hat f_{i-1}^{A})$ is bounded
sufficiently far from zero. Walker (2003) reaches the same conclusion based on the
assumption that $h(f^*, \hat f_n^{A})$ is bounded away from zero. Since $\varepsilon_n \to 0$, the condition here is
weaker than Walker's condition, meaning that $\Pi_n(A) \to 0$ for a wider class of sets $A$.

Our calculations thus far have all been limited to an $n^{-1/4}$ rate, but in many cases
$n^{-1/2}$ is possible. It turns out this restriction can be lifted by choosing appropriate
$A_n$. Indeed, the rates can be improved if the sets $A_n$ are not allowed to wander too far
from $f^*$. This allows us to choose $\varepsilon_n$ within a small factor of the optimal $n^{-1/2}$.

Proposition 5. Given $a_n \to \infty$, set $\varepsilon_n = (a_n/n)^{1/2}$. If $f^*$ is in the $\varepsilon_n$-support of $\Pi$ and,
in addition to (11),

$$\frac{1}{n} \sum_{i=1}^n E\{h(f^*, \hat f_{i-1}^{A_n})\} \leq D \varepsilon_n^2 / 2, \tag{13}$$

for some $D > C + 1$, then the conclusion of Proposition 4 holds for this smaller $\varepsilon_n$.

Proof. The proof is exactly like that of Proposition 4 as long as (12) obtains with the
less-rapidly growing $\omega_n = n\varepsilon_n^2 = a_n$. By the martingale array construction described
above, $(a_n/\omega_n^2) \sum_{i=1}^n E(X_{n,i}^2) \leq (2/n\varepsilon_n^2) \sum_{i=1}^n E\{h(f^*, \hat f_{i-1}^{A_n})\}$. According to assumption
(13), this last expression is bounded by $D$, so (12) obtains by Proposition 1. □

Condition (13) can be checked in simple cases. For example, if $A_n$ is a Hellinger ball of
radius $\varepsilon_n/2$ centered between $\varepsilon_n$ and $2\varepsilon_n$ Hellinger units from $f^*$, then, by the triangle
inequality and convexity of $A_n$ and $h$, (13) holds with $D = (5/2)^2$. Balls such as these are
useful for covering shells of the form $\{f : k\varepsilon_n < H(f^*, f) < (k + 1)\varepsilon_n\}$ for positive $k$.

Of course, if (11), or both (11) and (13), holds for $A_n = \{f : H(f^*, f) > \varepsilon_n\}$, then our
results thus far would provide a formal posterior convergence rate result. Unfortunately,
these conditions are not easily verified when the sets $A_n$ are not convex. In general,
this martingale approach seems to fall short of giving a full posterior convergence rate
result. The primary obstacle seems to be simultaneous handling of several sequences of
sets, i.e., $(A_{n,j})$ with $n \geq 1$ and $j = 1, \ldots, J_n$, where $J_n$ is either infinite or $J_n \to \infty$ as
$n \to \infty$. Somehow, a link is needed between the martingales associated with each of
these sequences, so that the conclusion of Proposition 4 can be made uniform in these
sequences. Naive uniformization requires unrealistically strong conditions, so something
more clever is needed.